ABSTRACT

Data-management-as-a-service systems are increasingly be- 
ing used in collaborative settings, where multiple users ac- 
cess common datasets. Cloud providers have the choice to 
implement various optimizations, such as indexing or ma- 
terialized views, to accelerate queries over these datasets. 
Each optimization carries a cost and may benefit multiple 
users. This creates a major challenge: how to select which 
optimizations to perform and how to share their cost among 
users. The problem is especially challenging when users are 
selfish and will only report their true values for different 
optimizations if doing so maximizes their utility. 

In this paper, we present a new approach for selecting 
and pricing shared optimizations by using Mechanism De- 
sign. We first show how to apply the Shapley Value Mech- 
anism to the simple case of selecting and pricing additive 
optimizations, assuming an offline game where all users ac- 
cess the service for the same time-period. Second, we extend 
the approach to online scenarios where users come and go. 
Finally, we consider the case of substitutive optimizations. 

We show analytically that our mechanisms induce truth- 
fulness and recover the optimization costs. We also show ex- 
perimentally that our mechanisms yield higher utility than 
the state-of-the-art approach based on regret accumulation. 

1. INTRODUCTION 

Over the past several years, cloud computing has emerged 
as an important new paradigm for building and using soft- 
ware systems. Multiple vendors offer cloud computing in- 
frastructures, platforms, and software systems including 
Amazon [3], Microsoft [10], Google [20], Salesforce [35], and 
others. As part of their services, cloud providers now offer 
data-management-in-the-cloud options ranging from
highly-scalable systems with simplified query interfaces (e.g.,
Windows Azure Storage [11], Amazon SimpleDB [9], Google App
Engine Datastore [21]), to smaller-scale but fully relational
systems (SQL Azure [26], Amazon RDS [6]), to data-intensive
scalable computing systems (Amazon Elastic MapReduce [4]),
to highly-scalable unstructured data stores (Amazon S3 [8]),
and to systems that focus on small-scale data
integration (Google Fusion Tables [19]).

Existing data-management-as-a-service systems offer multiple
options for users to trade off price and performance,
which we generically call optimizations. They include views
and indexes (e.g., users can create them in SQL Azure and
Amazon RDS), but also the choice of the physical location of
data, which affects latency and price (e.g., Amazon S3),
how data is partitioned (e.g., Amazon SimpleDB data
"domains"), and the degree of data replication (e.g., Amazon
S3 standard and reduced-redundancy storage). Cloud systems
have an incentive to enable all the right optimizations,
because this increases their customers' satisfaction and can
also optimize the cloud's overall performance.

Today, data owners most commonly pay all costs asso- 
ciated with hosting and querying their data, whether by 
themselves or by others. Data owners also choose, when pos- 
sible, the optimizations that should be applied to their data. 
However, there is a growing trend toward letting users col- 
laborate with each other by sharing data and splitting data 
access costs. For example, in the Amazon S3 storage service, 
users can currently share their data with select other users, 
with each user paying his or her own data access charges [7]. 

The combination of data sharing and optimizations cre- 
ates a major challenge: how to select the optimizations to 
implement and how to price them when one optimization 
can benefit multiple users. Implementing these optimiza- 
tions imposes a cost on the cloud that needs to be recovered: 
resources spent on implementing and maintaining optimiza- 
tions are resources that cannot be sold for query processing. 

A recently proposed approach by Kantere, Dash, et
al. [16, 22] addresses this problem by asking users to
indicate their willingness to pay for different query
performance values, observing the query workload, and deciding
on the optimizations to implement based on optimizations
that would have been helpful in the past (i.e., based on
regret). The cost of the implemented optimizations is
amortized over the future queries that make use of them. This
approach, however, has two key limitations, as we show in
Section 8. First, it assumes that users in the cloud will
truthfully reveal their valuations. In practice, users will
try to game the system if doing so improves their own
utility. Other collaborative systems like peer-to-peer networks
experience widespread gaming [2] that can degrade system
performance [17], and incentives to reduce gaming are core
components of modern peer-to-peer clients [15]. Second, this
approach does not guarantee that the cost of an
optimization will be recovered.

Given these two observations, we develop a new approach 
to select and price optimizations in the cloud based on
Mechanism Design [31, 33]. Mechanism Design is an area of game
theory whose goal is to choose a game structure and payment
scheme so as to obtain the best possible outcome
to an optimization problem even though selfish players have
to provide some input to the optimization. Our goal is to
enable the cloud to find the best configuration of optimiza- 
tions. For this, it needs users (i.e., selfish players) to reveal 
their valuations for these optimizations. 

The most closely related approaches from the Mechanism
Design literature are cost-sharing mechanisms [27]. Given a
service with some cost, these mechanisms decide which users
to service and how much the users should pay for the service.
We show how to adapt this technique from the game
theory community to the simplest problem of pricing a single
optimization when all users access the system for a single
time-period (i.e., offline games).

The problem of pricing optimizations in the cloud, how- 
ever, raises two additional challenges. First, in the cloud, 
users change their workloads as well as join and leave the 
system at any time. Such dynamism complicates the problem
because the choice and price of optimizations must vary
over time (i.e., we need an online mechanism), and users now
have new ways of gaming the system: they can lie about the 
time when they need an optimization and they can emulate 
multiple users. Second, multiple optimizations are available 
in the cloud and the value that a user derives from these 
optimizations can be given by a complex function. In par- 
ticular, in this paper, we consider additive, or independent, 
optimizations and substitutive, or equivalent, optimizations. 

We seek the following standard properties for our mech- 
anisms. First, we want the mechanisms to be truthful, also 
known as strategy-proof [31], which means that every player 
should have an incentive to reveal her true value obtained 
from each optimization. The approach by Dash, Kantere et 
al. [16, 22] mentioned above is not truthful as we discuss 
in Section 8: users can benefit from lying about their value 
for an optimization. We also want online mechanisms to 
be resilient to multiple identities and to misrepresentation 
of the time when a user needs an optimization. Second, 
we want the mechanisms to be cost-recovering, which means 
that the cloud should not lose money from performing the 
optimizations. In the approach by Dash, Kantere, et al. [16, 
22], the cloud first decides to implement an optimization 
and then the cost is amortized over the future queries that 
use it. Cost-recovery is thus not guaranteed. Finally, we 
want the mechanisms to be efficient, also known as
value-maximizing [31], which means that we want them to maximize
the total social utility of the system, i.e., the sum of user
values minus the cost of the implemented optimizations. For
example, if several users could benefit from an expensive 
optimization that none of them can afford to pay for in- 
dividually, then the cloud should perform the optimization 
and divide the cost among the users. 

In summary, we make the following four contributions: 

First, we show how the problem of pricing optimizations
maps onto a cost-recovery mechanism design problem
(Section 3). We also show how the Shapley Value Mechanism [27],
which is known to be both cost-recovering and
truthful, solves the problem of pricing a single optimization.
We propose a direct extension of the mechanism to the case
of additive optimizations in an offline scenario, where all
users access the system for the same time-period. We call
this basic mechanism the AddOff Mechanism (Section 4).

Second, we present a novel mechanism for the online
scenario where users come and go, called the AddOn Mechanism.
It turns out to be much more difficult to design mechanisms
for the online setting: algorithms that are truthful
or cost-recovering in the static setting cease to be so in the
dynamic setting (see [31, p. 412]). We prove our new mechanism
to be both cost-recovering and truthful in the dynamic
setting (Section 5).

Third, we extend both the AddOff Mechanism and the
AddOn Mechanism to the case where optimizations are
inter-dependent, or substitutive. We call these mechanisms the
SubstOff Mechanism and the SubstOn Mechanism and prove
them truthful (assuming users do not know other users'
valuations) and cost-recovering (Section 6).

Finally, it has been proven that achieving both truthfulness
and cost-recovery, in the face of selfish agents, comes at the
expense of total utility [27]. We experimentally compare
our mechanisms against the state-of-the-art approach based
on regret accumulation [16] and show that our mechanisms
produce up to 3x higher utility and provide the same
utility for ranges of optimization costs up to 12.5x higher
than the state-of-the-art approach, in addition to handling
selfish users and ensuring that the cloud recovers all costs.

2. MOTIVATING USE-CASE 

An important component of the astronomy research
conducted by our colleagues in the astronomy department at
the University of Washington involves large universe
simulations [23], where the universe is modeled as a set of
particles, which include dark matter, gas, and stars. All particles
are points in a 3D space with properties that include
position, mass, and velocity. Every few simulation time steps,
the simulator outputs a snapshot of the state of the
universe capturing all properties of all particles at the time of
the snapshot. State-of-the-art simulations (e.g., Springel et
al. [37]) use over 10 billion particles, producing a dataset of
over 200 GB per snapshot.

For each snapshot, astronomers first run a clustering al- 
gorithm to detect clusters, called halos. Some halos cor- 
respond to galaxies. Studying the evolution of these halos 
over time is a major component of their research. Different 
astronomers research different types of halos. In particular, 
our colleague indicated that: "There are in general three or 
four different halo mass ranges that different people focus 
on: high mass which corresponds to a cluster, Milky Way 
mass, slightly less than Milky Way mass and low mass/dwarf 
galaxies. [...] For example, I've been looking for Milky Way 
Mass galaxies, but another person in our group might be 
interested in the same sort of galaxies, but at a lower mass 
range. [The simulation] also helps us identify what environ- 
ment a given halo forms in - one person might be interested 
in a Milky Way mass galaxy that forms in relative isola- 
tion, another person might be interested in finding a Milky 
Way mass galaxy that forms near many other galaxies (a 
rich, cluster-like environment)." [25]. Additionally, different
scientists focus on different particle types and on the
simulation time steps that correspond to interesting time-periods
in the evolution of the halos that they study [25]. Thus,
different users may need different optimizations (indexes and
materialized views for this use-case), and the challenge is to
decide which ones to implement, and who pays for them.

In Section 7.2, we evaluate our mechanisms on real data 
and queries (optimized using materialized views) from this 
use-case. Since different scientists query different parts of 
the data, they benefit from different materialized views. 

3. A MECHANISM DESIGN PROBLEM 

In this section, we show how to model the problem of
selecting and pricing optimizations in the cloud as a
mechanism design [31] problem. We further show that our problem
requires a type of mechanism called a cost-sharing mechanism.
In this paper, we assume that every optimization is binary,
i.e., the cloud either implements it or not. We do not
consider continuous optimizations (e.g., degree of replication).

We consider a set of users, $I = \{1, \ldots, m\}$, who are using a
cloud service provider (a.k.a. cloud) to access and query
several datasets. Any user can potentially access any dataset.
Let $J = \{1, \ldots, n\}$ be the set of all potential optimizations
that the cloud offers for these datasets. For example, $j$ may
represent an index; or the fact that a dataset is replicated
in another data center; or an expensive fuzzy join
between two popular public datasets, which is precomputed
and stored as a materialized view. Upon deciding to do an
optimization $j$, the cloud may restrict access to $j$ to only
certain users; a grant pair $(i, j)$ indicates that user $i$ has
been granted permission to use the optimization $j$. While
grant permissions artificially prevent a user from accessing
an optimization, this restriction is required to ensure that
users reveal their true value for an optimization and pay
accordingly. A configuration, also called an alternative, is a set of
optimizations $j$ and a set of grant pairs¹ $(i, j)$. We denote
an alternative with $a$ and the set of all possible alternatives
with $A$. We also denote by $S_j = \{i \mid (i, j) \in a\}$ the users
who get access to the optimization $j$ in alternative $a$.

The goal of the mechanism will be to select a configuration
$a \in A$. The decision will be based on the optimization costs
and their values to users, which will determine the users'
willingness to pay for various optimizations.

Values to Users. Each user $i$ obtains a certain value
$v_{ij} \ge 0$ from each optimization $j$: e.g., monetary savings
obtained from faster execution or the ability to do a more
complex data analysis. When multiple optimizations are
performed, the total value to a user is given by $V_i(a) \ge 0$,
and is obtained by aggregating the values $v_{ij}$ for all grant
pairs $(i, j) \in a$. In this and the following two sections, we
consider additive optimizations, where the value is given by:



$$V_i(a) = \sum_{(i,j) \in a} v_{ij} \qquad (1)$$



We consider substitutive optimizations in Section 6. 

An important assumption in mechanism design is that
users may lie about their true values: when asked for her
value $v_{ij}$, user $i$ replies with a bid $b_{ij}$, where $b_{ij}$ may be
different from $v_{ij}$. In the case of an additive value function,
we denote $B_i(a) = \sum_{(i,j) \in a} b_{ij}$, where $B_i(a)$ is user $i$'s bid
about her value $V_i(a)$.

Cost to the Cloud. For each implemented optimization
$j \in J$, the cloud incurs an optimization cost $C_j > 0$,
which includes the initial cost of implementing the
optimization (e.g., building an index) and any possible maintenance
costs (e.g., updating the index) for the duration of the
service. This cost is an opportunity cost: the resources used
to perform the optimization cannot be sold to other users.
The cost of an alternative $a$ is then given by:



$$C(a) = \sum_{j \in a} C_j \qquad (2)$$



Even if each cost $C_j$ is small, the combined cost $C(a)$ may
be large since the number of potential optimizations is large.

Payments. Once an outcome $a$ is determined, each user
$i$ who is granted access to an optimization $j$ must pay some
amount $p_{ij}$. This payment is called the user's cost-share,
and is determined based on all users' bids², $(b_{ij})_{i=1,m;\,j=1,n}$.
If $P_i = \sum_j p_{ij}$ is the total payment for user $i$, her utility
is defined as $U_i(a) = V_i(a) - P_i$. A standard assumption
in Mechanism Design is that users are "utility maximizers",
i.e., they bid to maximize their utility [31, 33].

Cost-Sharing Mechanism Design Problem. After
collecting all bids, the mechanism chooses an outcome $a_0 \in A$
that optimizes some global value function. In the case of
cloud-based optimizations, we will aim to optimize the total
social utility ("total utility" for short): the outcome's total
value (Eq. 1) minus the outcome's cost (Eq. 2). Formally,
the mechanism chooses the following outcome $a_0$:



$$a_0 = \arg\max_{a \in A} \left( \sum_{i} B_i(a) - C(a) \right) \qquad (3)$$



Such a mechanism is called efficient [27]. Note that the
mechanism does not know the true values $V_i(a)$, but uses
the bids $B_i(a)$ instead. The goal of mechanism design is to
define the payment functions $p_{ij}$ so that all users have an
incentive to bid their true values, $B_i = V_i$. A mechanism
is called strategy-proof [31, 33], or truthful, if no user can
improve her utility $U_i(a)$ by bidding untruthfully, i.e., with
$B_i \neq V_i$. Truthful mechanisms are highly desirable, because
when users reveal their true values, the mechanism is in a
better position to select the optimal alternative.
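To make the objective in Eq. 3 concrete, the following minimal sketch enumerates every alternative of a tiny instance and keeps the one with the highest total utility computed from the bids. It is only an illustration under stated assumptions: the function names, the list-of-lists bid layout, and the sample numbers are hypothetical, and exhaustive search is feasible only for very small instances.

```python
from itertools import product


def total_utility(bids, costs, grants):
    """Objective of Eq. 3 for one alternative, given as a set of grant pairs.

    An optimization counts as implemented as soon as one grant pair uses it.
    """
    implemented = {j for (_, j) in grants}
    value = sum(bids[i][j] for (i, j) in grants)
    cost = sum(costs[j] for j in implemented)
    return value - cost, implemented


def efficient_outcome(bids, costs):
    """Brute-force search over all sets of grant pairs (illustrative only)."""
    pairs = [(i, j) for i in range(len(bids)) for j in range(len(costs))]
    best_utility, best_implemented, best_grants = 0, set(), set()
    for mask in product((False, True), repeat=len(pairs)):
        grants = {p for p, keep in zip(pairs, mask) if keep}
        utility, implemented = total_utility(bids, costs, grants)
        if utility > best_utility:
            best_utility, best_implemented, best_grants = utility, implemented, grants
    return best_utility, best_implemented, best_grants


# Hypothetical instance: two users (rows), two optimizations (columns).
bids = [[60, 5], [50, 10]]
costs = [100, 30]
best_utility, implemented, grants = efficient_outcome(bids, costs)
print(best_utility, implemented, grants)
```

In this instance only the first optimization is worth implementing: the two bids for it sum to 110 against a cost of 100, while the bids for the second sum to 15 against a cost of 30.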

Another desired property for cost-sharing mechanisms is
to be cost-recovering, i.e., to only pick outcomes $a_0$ such that:

$$C(a_0) \le \sum_{i} P_i \qquad (4)$$



¹We assume that, if an alternative contains a grant pair
$(i, j)$, then it also contains the optimization $j$.



Example 1. Consider a naive mechanism: The cloud
collects all bids $b_{ij}$; if $C_j \le \sum_i b_{ij}$, it performs the
optimization $j$ and asks each user to pay her bid ($p_{ij} = b_{ij}$). Clearly,
it is cost-recovering. However, it is not truthful: a user $i$ can
lie and declare a much lower value $b_{ij} \ll v_{ij}$, hoping that the
optimization would be performed anyway and she would end
up paying much less than her true value. The challenge in
designing any mechanism is to ensure its truthfulness.
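The incentive problem in Example 1 can be checked numerically. The sketch below implements the naive pay-your-bid rule for a single optimization and compares one user's utility when she bids truthfully and when she shades her bid. The function names and the sample values are assumptions made for illustration.

```python
def naive_mechanism(cost, bids):
    """Pay-your-bid rule of Example 1 for a single optimization.

    Returns (implemented, payments); every user pays exactly her bid
    when the bids cover the cost, and nothing otherwise.
    """
    if sum(bids) >= cost:
        return True, list(bids)
    return False, [0] * len(bids)


def utility(true_value, implemented, payment):
    """Value minus payment if the optimization is implemented, else 0."""
    return true_value - payment if implemented else 0


cost = 100
true_values = [80, 70]  # hypothetical private values of two users

# Both users bid truthfully: user 0 pays her full value.
implemented, payments = naive_mechanism(cost, true_values)
u_truthful = utility(true_values[0], implemented, payments[0])

# User 0 shades her bid to 30; the optimization is still implemented.
implemented_lie, payments_lie = naive_mechanism(cost, [30, true_values[1]])
u_lying = utility(true_values[0], implemented_lie, payments_lie[0])

print(u_truthful, u_lying)
```

Under these numbers, underbidding raises user 0's utility from 0 to 50 while the cloud still recovers its cost, which is exactly why the rule is cost-recovering but not truthful.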

Formally, a mechanism is defined as follows: 

Definition 1. A mechanism $(f, P_1, \ldots, P_m)$ consists of
a function $f : (\mathbb{R}^A)^m \to A$ (called the social choice function)
and a vector of payment functions $P_1, \ldots, P_m$, where
$P_i : (\mathbb{R}^A)^m \to \mathbb{R}$ is the amount that user $i$ pays.

²This is a very important point: the payment depends not
only on the outcome $a$, but on all bids. For example, in a
second-price auction, the winner's payment is the
second-highest bid [33].

Symbol              Description
$i, j, t, a$        Indexes for users, optimizations, time-slots, and outcomes.
$I, J, T, A$        Sets of users, optimizations, time-slots, and outcomes.
$S_j(t)$            Users serviced by optimization $j$ at time $t$.
$CS_j(t)$           All users serviced by optimization $j$ up until time $t$.
$v_{ij}(t)$         User $i$'s true (private) value for optimization $j$ at time $t$.
$b_{ij}(t)$         User $i$'s stated value for optimization $j$ at time $t$.
$B_i$               User $i$'s vector of stated values $b_{ij}$.
$V_i(a)$            User $i$'s total, true (private) value for outcome $a$.
$B_i(a)$            User $i$'s total, stated (public) value for outcome $a$.
$p_{ij}$            User $i$'s payment for optimization $j$.
$P_i$               User $i$'s total payment.
$U_i(a)$            User $i$'s utility for outcome $a$.
$C(a)$, $C_j$       Outcome $a$'s cost, and optimization $j$'s cost, respectively.
$s_i$               Slot when user $i$ enters the system.
$e_i$               Slot when user $i$ pays and leaves the system.

Table 1: Symbol Table. For symbols with the argument
time $t$, we drop $t$ for offline mechanisms.

The mechanism works as follows. After collecting bids
$B_1, \ldots, B_m$ from all users³, it chooses the alternative
$a = f(B_1, \ldots, B_m)$, and each user $i$ must pay $P_i(B_1, \ldots, B_m)$.

While we would like to design mechanisms that maximize
the total utility (Eq. 3), it is a proven result that one cannot
achieve cost-recovery (a.k.a. budget-balance), truthfulness,
and efficiency simultaneously [27]. In our setting, we ensure
only truthfulness and cost-recovery (Eq. 4) at the expense
of some efficiency loss. Indeed, if the cloud cannot recover
its cost, it will not implement the loss-making optimization.

4. A MECHANISM FOR STATIC COLLABORATIONS

We now show how to use the Shapley Value Mechanism [27],
which has many desirable properties, to solve the
problem of selecting and pricing additive optimizations for
one time-slot (i.e., offline games). We extend it to online
settings, where users come and go across multiple time-slots,
in Section 5, and to substitutive optimizations in Section 6.
For ease of reference, we summarize the notation used in
this paper in Table 1.

4.1 Background: Shapley Value Mechanism 

We start by reviewing the Shapley Value Mechanism [27],
shown in Mechanism 1. Fix a single optimization $j$, let
$C_j$ be its cost and $b_{1j}, \ldots, b_{mj}$ the users' bids for this
optimization. Mechanism 1 determines whether to perform
the optimization or not, and computes the set of serviced
users $S_j \subseteq \{1, \ldots, m\}$ and how much they have to pay, $p_{ij}$.
Intuitively, it finds the minimum price $p$ to charge to each
user who bid at least $p$ such that the total payment is
at least $C_j$. It starts by setting $S_j$ to the set of all users,
and divides the cost $C_j$ evenly among them: $p = C_j/|S_j|$.
If $p$ is larger than a user's bid $b_{ij}$, she is removed from $S_j$
and a new price is computed by dividing the cost evenly
among the remaining users. As a result, the cost per user,
$C_j/|S_j|$, may increase and additional users may need to be
removed from the set $S_j$. The process continues until
either no users remain or no further users need to be removed
from $S_j$. Each serviced user $i \in S_j$ pays the same amount
$p_{ij} = C_j/|S_j|$; each non-serviced user $i \notin S_j$ pays nothing,
i.e., $p_{ij} = 0$. If $S_j = \emptyset$, no subset of users has bid enough to
pay for the optimization, and it is not implemented at all.
The mechanism is cost-recovering, since
$\sum_{i \in S_j} p_{ij} = C_j$. The mechanism has also been proven to
be truthful [27]: if the user $i$ bids the true value $b_{ij} = v_{ij}$,


Mechanism 1 Shapley Value Mechanism: Computes the
users serviced by an optimization $j$, and their cost-share $p_{ij}$.

Input: Optimization cost $C_j$; bids $b_{1j}, \ldots, b_{mj}$
Output: Serviced users $S_j$; cost shares $p_{1j}, \ldots, p_{mj}$

$S_j \leftarrow \{1, \ldots, m\}$ /* The set of serviced users */
repeat
    $p \leftarrow C_j / |S_j|$ /* Divide cost evenly */
    $S_j \leftarrow \{i \mid i \in S_j, p \le b_{ij}\}$ /* Users still willing to pay */
until $S_j$ remains unchanged, or $S_j = \emptyset$
$p_{ij} \leftarrow p$ if $i \in S_j$ /* Serviced users pay the same amount */
$p_{ij} \leftarrow 0$ if $i \notin S_j$ /* Non-serviced users do not pay */
return $(S_j, (p_{ij})_{i=1,m})$

³Each bid $B_i$ is a function $A \to \mathbb{R}$.



her utility (which is $v_{ij} - p_{ij}$ if $i \in S_j$, and 0 otherwise)
is no smaller than her utility under any other bid. Indeed,
if she underbids, i.e., $b_{ij} < v_{ij}$, two cases are possible. If
$b_{ij} < C_j/|S_j|$, Mechanism 1 removes her from $S_j$ and finds a
smaller set of serviced users $S_j$ that excludes her: thus, her
utility drops to 0. Otherwise, she continues to belong to $S_j$, so her
payment $p_{ij}$ and her utility remain unchanged. Hence, she
cannot increase her utility by underbidding. Overbidding cannot
improve her utility either: it either leaves the outcome
unchanged, or keeps her in $S_j$ at a price above her true
value, which yields a negative utility.
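The iterative pruning of Mechanism 1 can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name, the dictionary-based bid layout, the user labels, and the sample numbers are assumptions, and a user is kept whenever her bid is at least the current price.

```python
def shapley_mechanism(cost, bids):
    """Sketch of Mechanism 1 for one optimization.

    bids maps each user to her bid. Returns (serviced, payments), where
    serviced is the set S_j and payments maps every user to her cost-share.
    """
    serviced = set(bids)
    price = 0.0
    while serviced:
        price = cost / len(serviced)          # divide the cost evenly
        still_in = {i for i in serviced if bids[i] >= price}
        if still_in == serviced:              # nobody dropped out: stop
            break
        serviced = still_in
    payments = {i: (price if i in serviced else 0.0) for i in bids}
    return serviced, payments


cost = 100
true_bids = {"u1": 60, "u2": 55, "u3": 20}    # hypothetical values
serviced, payments = shapley_mechanism(cost, true_bids)
print(serviced, payments)

# If u2 underbids (40 instead of 55), the price of 50 removes her, then u1
# cannot afford the full cost alone, and nothing is implemented.
serviced_lie, payments_lie = shapley_mechanism(cost, {"u1": 60, "u2": 40, "u3": 20})
print(serviced_lie, payments_lie)
```

With truthful bids, u3 is pruned at the first price of 33.3 and the remaining two users pay 50 each, so u2's utility is 5. By underbidding, u2 ends up with utility 0, which matches the argument above.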

4.2 AddOff Mechanism

We now propose our first mechanism for cloud optimization,
under the simplest setting, when the optimizations are
done offline and are additive; we remove these restrictions
in the next sections. Our mechanism, called AddOff,
iterates over $J$ and runs the Shapley Value Mechanism for each
optimization. It implements the optimization $j$ when the set $S_j$
is not empty, and it adds to $a$ the grant pairs for all serviced
users. Each user pays the sum of all per-optimization
payments. Since AddOff runs the Shapley Value Mechanism
independently for each optimization, and user values are
additive, it follows directly that AddOff is truthful and
cost-recovering, like the Shapley Value Mechanism itself.
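The per-optimization loop described above can be sketched as follows. The helper repeats the pruning loop of Mechanism 1 so that the block is self-contained; the names `add_off` and `shapley_mechanism`, the dictionary layout, and the sample costs and bids are illustrative assumptions.

```python
def shapley_mechanism(cost, bids):
    """Pruning loop of Mechanism 1 (see Section 4.1); bids maps user -> bid."""
    serviced = set(bids)
    price = 0.0
    while serviced:
        price = cost / len(serviced)
        still_in = {i for i in serviced if bids[i] >= price}
        if still_in == serviced:
            break
        serviced = still_in
    return serviced, {i: (price if i in serviced else 0.0) for i in bids}


def add_off(costs, bids):
    """Sketch of the offline additive mechanism.

    costs maps optimization -> C_j; bids maps optimization -> {user: b_ij}.
    Returns the implemented optimizations, the grant pairs, and each
    user's total payment P_i.
    """
    implemented, grants, total_payment = set(), set(), {}
    for j, cost in costs.items():
        serviced, shares = shapley_mechanism(cost, bids[j])
        if serviced:                           # implement j only if someone pays
            implemented.add(j)
        grants |= {(i, j) for i in serviced}
        for i, share in shares.items():
            total_payment[i] = total_payment.get(i, 0.0) + share
    return implemented, grants, total_payment


# Hypothetical instance with two optimizations and three users.
costs = {"index": 100, "view": 30}
bids = {
    "index": {"u1": 60, "u2": 55, "u3": 20},
    "view": {"u1": 5, "u2": 10, "u3": 40},
}
implemented, grants, total_payment = add_off(costs, bids)
print(implemented, sorted(grants), total_payment)
```

Here u1 and u2 split the index at 50 each, while u3 alone pays 30 for the view, so the payments add up to the total cost of 130.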

Even though no mechanism can be truthful,
cost-recovering, and efficient simultaneously, the Shapley Value
Mechanism has the important property of minimizing the
utility lost due to the cost-recovery constraint [27]. We show,
in Section 7, how this leads to high utilities even in the face
of selfish users compared to existing pricing techniques.

5. A MECHANISM FOR DYNAMIC COLLABORATIONS

The simple offline mechanism in the previous section is in- 
sufficient for optimizations in the cloud, because cloud users 
change over time. In this section, we develop a new on- 
line mechanism for pricing cloud optimizations, where users 
may join and leave the system at any time. In general, a 
truthful offline mechanism may no longer be truthful in an 
online setting [31, p. 412]; similarly, applying an offline
cost-recovering mechanism to an online setting may render it
non-cost-recovering. Our new mechanism is specifically designed
for an online setting, and we prove that it is both truthful 
and cost-recovering. We continue to restrict our discussion 
to additive optimizations (we drop this assumption in the 
next section), and therefore, without loss of generality, we 
discuss the mechanism assuming a single optimization j. 

An optimization's cost has two components: an initial
implementation cost (e.g., building an index) and a
maintenance cost (i.e., cost of index storage and index
maintenance). To avoid oscillations where users can afford the
initial implementation cost but not its maintenance cost,
we propose an approach where the cloud computes a single,
fixed cost $C_j$ for each optimization $j$. This cost captures
both the initial implementation cost and the maintenance
cost for some extended period of time $T$ (e.g., a month).
Users may join and leave at any time during $T$. However,
at the end of this time-period, the optimization's cost is
recomputed and all interested users must purchase it again.

5.1 AddOn Mechanism

We first explain how we model the time $T$. We divide $T$
into time-slots numbered $1, \ldots, z$, where a slot is the smallest
time interval for which a user can buy the service. If $T$ is
a month, slots could correspond to hours, days, or weeks.
The value for user $i$ is a tuple $\theta_{ij} = (s_i, e_i, v_{ij})$. Here, $s_i$
is the slot when she enters the system (e.g., by opening an
account) and $e_i$ is the slot when she leaves the system. $v_{ij}(t)$
is a function over the slots $1, \ldots, z$ such that, at each slot
$t \in [s_i, e_i]$, if user $i$ gets access to the optimization $j$, she
obtains the value $v_{ij}(t)$; otherwise she obtains a value of 0. We
assume that $v_{ij}(t) = 0$ if $t < s_i$ or $t > e_i$. $v_{ij}(t)$ can be an
arbitrary non-negative function and may be such that user $i$
only uses the optimization for a subset of the slots in $[s_i, e_i]$.

Users bid for the optimization $j$ by declaring their values
as $\theta_{ij} = (s_i, e_i, b_{ij})$, where $b_{ij}(t)$ is a function of time over the
interval $t \in [s_i, e_i]$. The cloud collects the bids at each slot
$t \in [1, z]$: a bid cannot be retroactive ($s_i < t$), but users are
allowed to revise their future bids ($b_{ij}(t')$, $t' > t$) upwards⁴.
For example, at time $t = 1$, let user 1 bid $(1, 3, [10, 10, 10])$,
meaning $b_{1j}(1) = b_{1j}(2) = b_{1j}(3) = 10$; at time $t = 2$ she
may revise her bids as $b_{1j}(2) = 20$, $b_{1j}(3) = 10$. For each
time-slot $t$, the cloud needs to determine the set of serviced
users $S_j(t)$, based on the current bids. When a user $i$ leaves
the system at time $e_i$, she has to pay a certain amount $p_{ij}$.

Example 2. Consider an optimization $j$ with cost $C_j = 100$,
and two users with values $\theta_{1j} = (1, 1, [101])$ and
$\theta_{2j} = (1, 2, [26, 26])$. Thus, user 1 obtains a value of 101 at $t = 1$
if she can access the optimization; user 2 obtains a value of 26
at each of the times $t = 1, 2$, if she can access the
optimization. Consider the following naive adaptation of the Shapley
Value Mechanism to a dynamic setting. Run the mechanism
at each time-slot, until it decides to implement the
optimization: at that point the cloud has recovered the cost, and will
continue to offer the optimization for free to new users. In
our example, the optimization will be performed at $t = 1$,
each user will pay 50, and $52 - 50 = 2$ will be user 2's
utility. The problem is that the mechanism is not truthful: user
2 may cheat by bidding $(2, 2, [26])$. That is, if user 2 hides
her value during the first slot, user 1 would pay the entire
cost of the optimization at $t = 1$, and user 2 would get a
free ride at $t = 2$, obtaining a higher utility of $26 - 0 = 26$.
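The free-riding problem of Example 2 can be reproduced with a short simulation. The sketch below is one possible reading of the naive adaptation: it runs the Shapley pruning on the residual bids of the users present at each slot until the optimization is implemented, and afterwards gives access for free. The function names and data layout are assumptions made for this illustration.

```python
def shapley_mechanism(cost, bids):
    """Pruning loop of Mechanism 1; bids maps user -> bid."""
    serviced = set(bids)
    price = 0.0
    while serviced:
        price = cost / len(serviced)
        still_in = {i for i in serviced if bids[i] >= price}
        if still_in == serviced:
            break
        serviced = still_in
    return serviced, {i: (price if i in serviced else 0.0) for i in bids}


def naive_online(cost, bids, num_slots):
    """Naive adaptation from Example 2. bids maps user -> (s, e, per-slot bids)."""
    payments = {i: 0.0 for i in bids}
    first_access = {}                      # user -> first slot with access
    implemented = False
    for t in range(1, num_slots + 1):
        # Residual bids of the users present at slot t.
        present = {i: sum(b[t - s:]) for i, (s, e, b) in bids.items() if s <= t <= e}
        if implemented:                    # cost already recovered: free access
            for i in present:
                first_access.setdefault(i, t)
            continue
        serviced, shares = shapley_mechanism(cost, present)
        if serviced:
            implemented = True
            for i in serviced:
                first_access[i] = t
                payments[i] = shares[i]
    return first_access, payments


def utility(true_value, first_slot, payment):
    """True value accumulated from the first slot with access, minus payment."""
    s, e, values = true_value
    if first_slot is None:
        return 0.0
    return sum(values[first_slot - s:]) - payment


true_2 = (1, 2, [26, 26])                  # user 2's true value in Example 2

access, pay = naive_online(100, {1: (1, 1, [101]), 2: true_2}, 2)
u_truthful = utility(true_2, access.get(2), pay[2])

access_lie, pay_lie = naive_online(100, {1: (1, 1, [101]), 2: (2, 2, [26])}, 2)
u_lying = utility(true_2, access_lie.get(2), pay_lie[2])

print(u_truthful, u_lying)
```

Under this reading, user 2's utility rises from 2 to 26 when she hides her first-slot value, and user 1 ends up paying the full cost of 100.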

Our mechanism addresses the challenge outlined in the
above example. Mechanism 2 shows the detailed
pseudo-code. Intuitively, it works as follows: First, it runs the
Shapley Value Mechanism at each slot $t$ using the residual
bid $\sum_{\tau \ge t} b_{ij}(\tau)$ for each user $i$ (line 7). The residual bid
captures the remaining value that each user would achieve
if the optimization were implemented at the current slot
$t$. This process repeats until the mechanism reaches a slot
with a high enough value in the residual bids to implement


Mechanism 2 AddOn Mechanism: Cost-sharing mechanism
for additive optimizations, for multiple slots.

Input: Optimization $j$; cost $C_j$; bids $(s_i, e_i, b_{ij})_{i=1,m}$
Output: Serviced users $(S_j(t))_{t=1,z}$; payments $(p_{ij})_{i=1,m}$

 1: $CS_j(0) \leftarrow \emptyset$; $p_{ij} \leftarrow 0, \forall i = 1, m$
 2: for each time slot $t = 1, z$ do
 3:   for each user $i = 1, m$ do
 4:     if $i \in CS_j(t-1)$ then
 5:       $b'_i \leftarrow \infty$ /* Force user $i$ to be serviced */
 6:     else if $t \ge s_i$ then
 7:       $b'_i \leftarrow \sum_{\tau \ge t} b_{ij}(\tau)$ /* Residual value at time $t$ */
 8:     else
 9:       $b'_i \leftarrow 0$ /* Prune users not yet seen */
10:     end if
11:   end for
12:   /* Update the set of serviced users */
13:   $(CS_j(t), (p'_{ij})_{i=1,m}) \leftarrow$ Shapley-Mech$(C_j, (b'_i)_{i=1,m})$
14:   $S_j(t) \leftarrow \{i \mid i \in CS_j(t), t \le e_i\}$ /* Service active users */
15:   for $i = 1, m$ do
16:     if $e_i = t$ then
17:       $p_{ij} \leftarrow p'_{ij}$ /* User $i$ pays when her bid expires */
18:     end if
19:   end for
20: end for
21: return $((S_j(t))_{t=1,z}, (p_{ij})_{i=1,m})$



the optimization. At that time, the optimization is imple- 
mented, the users who could afford it get access to it, and 
an initial cost-share is computed. In subsequent time-slots, 
all previously serviced users continue to be serviced. If a 
new user arrives, the system has two options: allow her to 
pay the previously computed cost-share and access the op- 
timization or recompute a lower cost-share given the extra 
contribution of the new user. We choose the latter approach 
since it minimizes the cost-share and maximizes the number
of users who get the service. As a result, the per-user
cost-share decreases as new users join the system and
contribute to the optimization cost. Users actually pay for the
optimization only when they leave the system at time $e_i$. At
that time, they pay the lowest cost-share computed so far.
Notice that, when a user i pays and leaves, the cost-share 
does not increase for the remaining users since i paid her 
share of the optimization cost. 

More formally, the AddOn Mechanism computes for each
time-slot $t \in [1, z]$ the set of serviced users $S_j(t)$ (line
14), and computes the payment $p_{ij}$ (lines 15-19) for each
user $i$ leaving at time $t$, using the Shapley Value
Mechanism. Denote the cumulative set of serviced users as
$CS_j(t) = \bigcup_{\tau \le t} S_j(\tau)$. The key modification to the Shapley
Value Mechanism is to have it operate on $CS_j(t)$ rather than
$S_j(t)$ (line 13). This ensures that all users who have used or
will use the optimization contribute equally to pay for the
cost. Once a user is serviced at some time $t$, $i \in S_j(t)$, all
her future bids are assumed to be $\infty$ (line 5): this ensures
that the Shapley Value Mechanism will always include $i$ in
$CS_j(t)$. The users actually serviced, $S_j(t)$, are the active
users in $CS_j(t)$ (line 14).



As a consequence, $CS_j(t)$ can only increase.

Example 3. Let the cost of the optimization be $C_j = 100$, with
four users bidding (1, 1, [101]), (1, 3, [16, 16, 16]), (2, 2, [26]),
and (2, 2, [26]). Then $CS_j(1) = \{1\}$, $CS_j(2) = \{1,2,3,4\}$,
$CS_j(3) = \{1,2,3,4\}$. Note that user 2 is not included in
$CS_j(1)$ because her bid of 48 is below $C_j/2$. At time $t = 2$ her
remaining total value is only 32; however, since there are now four
users, each user's share is $C_j/4$ and therefore all users are
included in $CS_j(2)$ and in $CS_j(3)$. Users 1, 2, 3, 4 leave at times
$t = 1$, $t = 3$, $t = 2$, $t = 2$ respectively, so they pay 100, 25, 25, 25.
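
The walk-through in Example 3 can be replayed with a short sketch. The code below is only an illustration under stated assumptions, not the authors' implementation: the function names `shapley_mech` and `add_on`, the dictionary-based bid format, and the choice to model "not yet seen" users with a residual bid of 0 are all hypothetical. The Shapley round repeatedly drops users whose bid is below the equal share; the online loop feeds it residual values and forces previously serviced users to bid infinity.

```python
import math

def shapley_mech(cost, bids):
    """One round of the Shapley Value Mechanism for a single optimization.

    bids maps user -> bid. Returns (serviced users, equal cost share).
    """
    serviced = set(bids)
    while serviced:
        share = cost / len(serviced)
        # Keep only the users whose bid covers the current equal share.
        affordable = {i for i in serviced if bids[i] >= share}
        if affordable == serviced:
            return serviced, share
        serviced = affordable
    return set(), 0.0

def add_on(cost, user_bids, horizon):
    """user_bids maps user -> (s_i, e_i, per-slot values); slots are 1-based."""
    cumulative = set()                    # CS_j(t): everyone serviced so far
    cumulative_at, serviced_at = {}, {}
    payments = {i: 0.0 for i in user_bids}
    for t in range(1, horizon + 1):
        residual = {}
        for i, (s, e, values) in user_bids.items():
            if i in cumulative:
                residual[i] = math.inf             # already serviced: always kept
            elif s <= t:
                residual[i] = sum(values[t - s:])  # value remaining from slot t on
            else:
                residual[i] = 0.0                  # not yet seen
        cumulative, share = shapley_mech(cost, residual)
        cumulative_at[t] = set(cumulative)
        # Only users whose interval is still open are actually serviced.
        serviced_at[t] = {i for i in cumulative if user_bids[i][1] >= t}
        for i in cumulative:
            if user_bids[i][1] == t:
                payments[i] = share                # pay on departure
    return cumulative_at, serviced_at, payments

# Bids from Example 3 with C_j = 100.
bids = {1: (1, 1, [101]), 2: (1, 3, [16, 16, 16]), 3: (2, 2, [26]), 4: (2, 2, [26])}
cumulative_at, serviced_at, payments = add_on(100, bids, horizon=3)
```

With these inputs the sketch yields the cumulative sets and the payments of 100, 25, 25, 25 stated in Example 3.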

5.2 Properties 

We prove that Add^On has three important properties: (1)
it is resilient to bids with both untruthful values and
untruthful times, (2) it is cost-recovering, and (3) although
users can increase their own utilities by using multiple
identities, they cannot decrease the utility of other users.

Truthful. The definition of a truthful mechanism in the dy- 
namic setting is more subtle than in the static setting. In a 
static scenario, the mechanism is called truthful if for any set 
of bids, user $i$ cannot obtain more utility by bidding $b_{ij} \ne v_{ij}$
than by bidding her true value $b_{ij} = v_{ij}$. In the dynamic
case, user utilities depend not only on the other bids received 
until now, but also on what will happen in the future. We 
assume the model-free [31] framework to define truthfulness 
in the dynamic case: it assumes that bidders have no knowl- 
edge of the future agents and their preferences. At each time 
t, every agent assumes their worst utility over all future bids, 
and they bid to maximize this worst utility [31]. 

Example 4. Consider Example 3. User 2 bids
(1, 3, [16, 16, 16]), thus she could obtain a value of 16 at each
of the three time-slots $t = 1, 2, 3$; but she is serviced only at
time-slots $t = 2, 3$, hence her value is 16 + 16 = 32. She
pays 25, thus her utility is 32 - 25 = 7. Suppose that she
cheats by overbidding (1, 3, [17, 17, 17]). Now she is serviced
at all three time-slots, but still pays only 25 (because when
she leaves there are four users in $CS_j$). Thus, for the
particular bids in Example 3, user 2 could improve her utility
by cheating. In a model-free framework, however, users do
not know the future, and they must assume the worst-case
scenario. In our example, the worst-case utility for user 2 at
$t = 1$ (when she places her bid) corresponds to the case when
no new bids arrive in the future: in this case, if she overbids
above 50, she ends up paying 50, and her utility is 48 - 50 = -2.
If she underbids, her worst-case utility is still 0. By cheating
at $t = 1$, user 2 cannot increase her worst-case utility.

With the model-free notion of truthfulness, a dynamic 
mechanism is called truthful if, for each user, revealing her 
true preferences maximizes the minimum utility that she can 
receive, over all possible bids by future users. This definition 
of truthfulness reduces to the classic definition of truthfulness
for the static case (i.e., with a single time slot).

Proposition 1. The Add^On Mechanism is truthful.

Proof. (Sketch) Consider a user $i$ bidding at time $t$, i.e.,
her bid is $(s_i, e_i, b_{ij})$ and $t \le s_i$ (bids cannot be placed
for the past). We claim that her minimum utility over all
future users' preferences (at times $t+1, t+2, \ldots$) is attained
when no new bids arrive in the future. Indeed, any new bids
in the future can only decrease the payment due by user $i$
(by increasing the set $CS_j(e_i)$, hence decreasing her payment
$p_{ij} = C_j/|CS_j(e_i)|$), and can only increase her value at every
future time slot $t' \le e_i$, by including $i$ in a set $S_j(t')$ where
she was previously not included. Thus, the minimum utility for
user $i$ is when no new bids arrive after time $t$. But in that
case, Add^On degenerates to one round of the Shapley-Value
Mechanism, run at time $t$, which is proven to be truthful. □



Cost-recovering. Intuitively, Add^On recovers all costs because
it always applies the Shapley-Value Mechanism to the
game given by all bids known at the present time. Due to
lack of space, we defer the proof to our technical report [41].

Multiple Identities. A user could create multiple identi- 
ties and place a separate bid for each identity. If at least 
one identity gets access to the optimization, she obtains her 
full value (by running her queries under that identity). How- 
ever, she has to pay on behalf of all identities. It turns out 
that a user can increase her utility this way: by creating 
more identities, she could help more users to be serviced 
and thus decrease her total payment. For example, consider 
an optimization that costs Cj = 101 and a user Alice whose 
value is (1, 1, [101]). Suppose there are 99 other users whose 
values are (1, 1, [1]). Of the 100 users, only Alice is serviced, 
because even if all the other 99 users were serviced, each 
would be paying 101/100 = 1.01, which would exceed their 
value of 1. However, if Alice creates two identities, each 
bidding (say) (1, 1, [101]), Add^On would see 101 users and
would serve all of them with each of the 99 users paying
101/101 = 1, while Alice would pay 2, once for each identity.
Thus, her utility would increase from 101 - 101 = 0 to
101 - 2 = 99. Add^On does not prevent such ways of gaming
the system, because they are indistinguishable from collab- 
orations. For example, instead of cheating, Alice could ask 
Bob (whose value is at least 1) to participate in the game, 
then reimburse Bob for his payment: this is indistinguish- 
able from creating a fake identity. On the other hand, this is 
not undesirable: through her action, she caused more users 
to be serviced, while agreeing to pay a bit more than the 
other users' shares. We can prove that this holds in general. 



Proposition 2. Suppose a user $i$ can increase her utility
under Add^Off or Add^On by creating multiple identities
$i_1, i_2, \ldots$ Then no other user's utility decreases.

Proof. (Sketch) Consider two games, one with user $i$
with a single account and one with user $i$ creating $k$
identities $i_1, \ldots, i_k$ and associated bids. Her utility can
increase by creating dummy identities only if the total payment by
the dummies is less than her payment without the dummies.
Let user $i$'s payment with no dummies be $p_i$ and
the total payment of her dummies be $p'_i$. Since creating
dummies increases $i$'s utility, $p'_i < p_i$, and the payment per
dummy (which is also the payment per user in the game with
the dummy accounts) is $p'_i/k < p'_i < p_i$. Thus, all users
served in the game with no dummies are surely served with
dummies too, since the payment per user is lower than without
the dummies. Hence the utility of no user decreases. □
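
The Alice example from the Multiple Identities discussion can be checked numerically. The sketch below assumes a hypothetical helper `shapley_mech` that implements a single Shapley Value round by repeatedly dropping users who cannot afford the equal share; the user labels are invented for illustration. It compares the outcome with one Alice identity against the outcome with two.

```python
def shapley_mech(cost, bids):
    """Single Shapley Value round: returns (serviced users, equal share)."""
    serviced = set(bids)
    while serviced:
        share = cost / len(serviced)
        affordable = {i for i in serviced if bids[i] >= share}
        if affordable == serviced:
            return serviced, share
        serviced = affordable
    return set(), 0.0

# 99 users who each value the optimization at 1; the optimization costs 101.
others = {f"user{k}": 1 for k in range(99)}

# One identity: the equal share 101/100 exceeds the others' value of 1.
single_set, single_share = shapley_mech(101, {"alice": 101, **others})

# Two identities: 101 bidders bring the equal share down to exactly 1.
double_set, double_share = shapley_mech(
    101, {"alice_a": 101, "alice_b": 101, **others})

alice_utility_single = 101 - single_share      # pays the whole cost alone
alice_utility_double = 101 - 2 * double_share  # pays once per identity
```

In the second game every one of the 99 other users is serviced at a share equal to her value, so none of them is worse off, which is the behaviour Proposition 2 describes.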

6. MECHANISMS FOR SUBSTITUTABLE 
OPTIMIZATIONS 

In this section, we relax the requirement that optimizations
be independent. Indeed, when multiple optimizations
(e.g., indexes or materialized views) exist, the value to the
user from a set of optimizations can be a complex combination
of the individual optimization values. In this section, we
consider the case of substitutable optimizations. Formally,
each user defines a set of substitutable optimizations $J_i \subseteq J$
such that $\forall j, k \in J_i : v_{ij} = v_{ik} = v_i > 0$. Additionally,
given an outcome $a$, $v_i(a) = v_i$ if $\exists j \in J_i : (i,j) \in a$ and
$v_i(a) = 0$ otherwise. In comparison to the substitutable
valuation, the valuation function that we previously used was

Mechanism 3 Subst^Off Mechanism: Cost-sharing mechanism
for substitutable optimizations for a single slot.

Input: Opts. $J$; costs $(C_j)_{j=1,n}$; bids $(b_{ij})_{i=1,m;\,j=1,n}$
Output: Alternative $a \in A$; cost shares $(p_{ij})_{i=1,m;\,j=1,n}$
$a \leftarrow \emptyset$; $p_{ij} \leftarrow 0$, $\forall i = 1,m$ $\forall j = 1,n$
loop
    for each optimization $j$ in $J$ do
        /* Compute serviced users, discard payments */
        $(S_j, (p_{ij})_{i=1,m}) \leftarrow$ Shapley-Mech$(C_j, (b_{ij})_{i=1,m})$
    end for
    /* Find the smallest cost-share optimization */
    $J_f \leftarrow \{j \in J \mid S_j \ne \emptyset\}$ /* Set of feasible opts */
    if $J_f \ne \emptyset$ then
        $j_{min} \leftarrow \arg\min_{j \in J_f} (C_j/|S_j|)$
        $a \leftarrow a \cup \{j_{min}\}$ /* Perform optimization $j_{min}$ */
        for each user $i \in S_{j_{min}}$ do
            $a \leftarrow a \cup \{(i, j_{min})\}$
            $p_{i j_{min}} \leftarrow C_{j_{min}}/|S_{j_{min}}|$
            $b_{ij} \leftarrow 0$, $\forall j \in J$ /* Remove i from future loops */
        end for
        $C_{j_{min}} \leftarrow \infty$ /* Remove $j_{min}$ from future loops */
    else
        return $(a, (p_{ij})_{i=1,m;\,j=1,n})$
    end if
end loop


the sum: $v_i(a) = \sum_{(i,j) \in a} v_{ij}$. With substitutable
valuations, a user bid takes the form $b_i = (J_i, v_i)$, where $J_i$ is the
set of substitutable optimizations and $v_i$ is the user value if
she is granted access to at least one optimization in $J_i$.

Substitutable optimizations capture the case where implementing
any optimization from a set (e.g., indexes, materialized
views, or replication) can speed up a workload by a
similar amount and the user does not have any preference as
to which optimization is responsible for the speed-up. However,
she gets no added value from multiple optimizations
being implemented at the same time, either because they
may be redundant (e.g., a materialized view may remove
the need for a specific index) or because she is indifferent to
further performance gains.

6.1 Subst^Off Mechanism

We first consider the Subst^Off Mechanism for static games
where all users use the system for the same time period. 

Example 5. Consider three optimizations with costs
$C_1 = 60$, $C_2 = 180$, and $C_3 = 100$. The bid ({1, 2}, 100),
placed by user 1, indicates that she values access to either
optimization 1 or 2 at 100. Other example bids include ({3}, 101),
({1, 2, 3}, 60), and ({2}, 70), for users {2, 3, 4}, respectively.

The challenge with substitutable optimizations is that 
users may bid for partially overlapping sets of optimizations 
as in Example 5. They also have a new way of cheating. In 
addition to lying about their value Vi and emulating multi- 
ple users, they may lie about the optimizations they want by 
either bidding for ones they do not want or by not bidding 
for the ones they do want. Our mechanisms are truthful un- 
der the model-free notion and are also resistant to cheating 
with dummy users under the practical assumption that no 
user knows other users' bids. 

The Subst^Off Mechanism (Mechanism 3) works in a sequence
of phases. In the first phase, it runs the Shapley Value 
mechanism for each optimization j (along with the users 
who bid for j) independently and selects the optimization 



Mechanism 4 Subst^On Mechanism: Cost-sharing mechanism
for substitutable optimizations, for multiple slots.

Input: Opts $J$; costs $(C_j)_{j=1,n}$; bids $(s_i, e_i, (b_{ij})_{j=1,n})_{i=1,m}$
Output: Serviced users $(S_j(t))_{t=1,z}$; payments $(p_{ij})_{i=1,m}$
$a \leftarrow \emptyset$; $p_{ij} \leftarrow 0$, $\forall i = 1,m$
for each time slot $t = 1, z$ do
    for each user $i = 1, m$ do
        if $\exists j \in J.\ (i,j) \in a$ then
            $b'_{ij} \leftarrow \infty$ /* Force user i to be serviced */
            $b'_{ij'} \leftarrow 0$, $\forall j' \in J, j' \ne j$ /* Force i to only use j */
        else if $t \ge s_i$ then
            $b'_{ij} \leftarrow \sum_{\tau \ge t} b_{ij}(\tau)$ /* Remaining value known at t */
        else
            $b'_{ij} \leftarrow 0$ /* Prune users not yet seen */
        end if
    end for
    /* Update the set of serviced users */
    $(a, p'_{ij}) \leftarrow$ SubstOff$(J, (C_j)_{j=1,n}, (b'_{ij})_{i=1,m;\,j=1,n})$
    $S_j(t) \leftarrow \{i \mid \exists j.\ (i,j) \in a,\ t \le e_i\}$
    for $i = 1, m$ do
        if $e_i = t$ then
            $p_{ij} \leftarrow p'_{ij}$ /* User i pays when her bid expires */
        end if
    end for
end for
return $((S_j(t))_{j=1,n;\,t=1,z}, (p_{ij})_{i=1,m;\,j=1,n})$



$j_{min}$ with the lowest cost-share. Users who want $j_{min}$ and
can pay its cost-share get access to it. The mechanism then 
recursively applies the algorithm to the remaining users and 
optimizations in subsequent phases. 

Example 6. Consider Example 5. Subst^Off first identifies
optimization 1 as having the lowest cost-share with
$S_1 = \{1, 3\}$ and cost-share 60/2 = 30, and thus implements
optimization 1 and services users 1 and 3. Next, Subst^Off
considers the remaining users {2, 4} and the remaining
optimizations {2, 3}. For these optimizations, $S_2 = \emptyset$ while
$S_3 = \{2\}$. Optimization 3 is thus implemented and user 2 is
given access to it. User 4 gets access to no optimization.
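
The phases of Example 6 can be traced with the following sketch. It is a hedged illustration rather than the paper's code: `shapley_mech` and `subst_off` are hypothetical names, bids are passed as `(set of optimizations, value)` pairs, serviced users and implemented optimizations are deleted from local copies instead of having their bids set to 0 and their costs set to infinity, and ties between cost-shares are broken by the smaller optimization id (an assumption; the text mentions a random choice).

```python
def shapley_mech(cost, bids):
    """Single Shapley Value round: returns (serviced users, equal share)."""
    serviced = set(bids)
    while serviced:
        share = cost / len(serviced)
        affordable = {i for i in serviced if bids[i] >= share}
        if affordable == serviced:
            return serviced, share
        serviced = affordable
    return set(), 0.0

def subst_off(costs, bids):
    """costs maps opt -> cost; bids maps user -> (set of opts, value).

    Returns (assignment user -> opt, payments user -> cost share).
    """
    costs = dict(costs)          # local copy: implemented opts are removed
    remaining = dict(bids)       # users not yet serviced
    assignment, payments = {}, {}
    while True:
        feasible = {}
        for j, cost in costs.items():
            # Run an independent Shapley round among the users who bid for j.
            bidders = {i: v for i, (opts, v) in remaining.items() if j in opts}
            serviced, share = shapley_mech(cost, bidders)
            if serviced:
                feasible[j] = (share, serviced)
        if not feasible:
            return assignment, payments
        # Lowest cost-share wins; ties go to the smaller id (assumption).
        j_min = min(feasible, key=lambda j: (feasible[j][0], j))
        share, serviced = feasible[j_min]
        for i in serviced:
            assignment[i] = j_min
            payments[i] = share
            del remaining[i]     # remove i from future phases
        del costs[j_min]         # remove j_min from future phases

# Costs and bids from Example 5.
costs = {1: 60, 2: 180, 3: 100}
bids = {1: ({1, 2}, 100), 2: ({3}, 101), 3: ({1, 2, 3}, 60), 4: ({2}, 70)}
assignment, payments = subst_off(costs, bids)
```

For these inputs users 1 and 3 share optimization 1 at 30 each, user 2 alone pays 100 for optimization 3, and user 4 does not appear in the assignment.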

Due to space constraints, we defer the proof that Subst^Off
is cost-recovering and truthful to our technical report [41]. 
Example 7 provides an intuition for its truthfulness. 

Example 7. Consider Example 6. If, to cheat, user 3
bids any value in the range $[30, \infty)$, the outcome and her
utility would not change. If she bids below 30, however, she
would not be serviced by optimization 1 as her bid would be
below the cost-share. She would not get serviced by any other
optimization either, because their cost-shares are higher than
that of optimization 1, which has the lowest cost-share. Her
utility would be 0 (< 30). Finally, if she, being untruthful,
does not bid for optimization 1, even though it benefits her,
and bids ({2, 3}, 60), then both optimization 1 and 2 would
tie for the lowest cost-share at 60. Assuming that Subst^Off
makes a random choice and implements optimization 2, then
she would get access to optimization 2 and would pay the
cost-share of 60, achieving a strictly lower utility of 0.

6.2 Subst^On Mechanism

We now consider substitutable optimizations, but in a dynamic
setting where users can join and leave the system in
any time-slot. Given substitutable optimizations $J_i$, user $i$
bids $\omega_i = (s_i, e_i, b_i, J_i)$, with $[s_i, e_i]$ as the requested interval
of service and $b_i(t)$ the value she gets at time $t$.

The Subst^On Mechanism, shown in Mechanism 4, works by
running Subst^Off at each time-slot $t$ with the residual value
of all the users seen. The first time a user $i$ is granted access
to optimization $j$, her bid for $j$ is updated to $\infty$ (so that
she is always in the feasible set of $j$), while her bids for the
other optimizations are updated to 0 (so that she remains
serviced only by optimization $j$).

Example 8. Consider three optimizations {1, 2, 3} with
costs $C_1 = 60$, $C_2 = 100$, $C_3 = 50$. User 1 bids
(1, 2, 100, {1, 2}), which is interpreted as follows: she values
any optimization in {1, 2} at 100 for the time-slots
[1, 2]. User 2 bids (2, 3, 100, {1, 2, 3}) and user 3 bids
(3, 3, 100, {3}). At $t = 1$, Subst^On runs Subst^Off with user
1 (the only user at that time) and ends up implementing
optimization 1, with a payment of 60. Then, Subst^On updates
user 1's bid to optimization {1} valued at $\infty$. At time $t = 2$,
Subst^On runs Subst^Off with users {1, 2} and ends up granting
user 2 access to optimization 1 with the new payments for
both users being 60/2 = 30. User 1 leaves after paying 30,
while user 2's bids are updated to optimization {1} valued at
$\infty$. At time $t = 3$, Subst^On again executes Subst^Off with all
three users (although user 1 left, she is included while invoking
Subst^Off to compute the proper cost-share for user 2),
and ends up implementing optimization 3, but only for user
3, at a payment of 50. User 2 is not serviced by optimization 3
since she is already using optimization 1 and Subst^On does
not allow her to switch to a new optimization. The system
ends with user 2 paying 30 and user 3 paying 50. The
inability to switch is crucial for truthfulness: otherwise, a new
user, say user 4, who prefers optimizations {1, 3}, arriving
at time $t = 3$, might only bid for optimization 3 hoping that
user 2 would switch to optimization 3. If user 2 could switch,
each would pay 50/3 = 16.7, while without the switch, user 2
pays 60/2 = 30 (as before) and users {3, 4} pay 50/2 = 25.
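
Example 8 can also be replayed in code. The sketch below is self-contained and makes several assumptions that are not in the text: the names `shapley_mech`, `subst_off`, and `subst_on` and the bid tuples are hypothetical, the value 100 is read as a constant per-slot value, ties are broken by the smaller optimization id, and a user who has been serviced once is modelled by giving her an infinite bid for her assigned optimization and no bid for any other.

```python
import math

def shapley_mech(cost, bids):
    """Single Shapley Value round: returns (serviced users, equal share)."""
    serviced = set(bids)
    while serviced:
        share = cost / len(serviced)
        affordable = {i for i in serviced if bids[i] >= share}
        if affordable == serviced:
            return serviced, share
        serviced = affordable
    return set(), 0.0

def subst_off(costs, bids):
    """bids maps user -> {opt: bid}; a missing opt counts as a bid of 0."""
    costs, remaining = dict(costs), dict(bids)
    assignment, payments = {}, {}
    while True:
        feasible = {}
        for j, cost in costs.items():
            bidders = {i: b[j] for i, b in remaining.items() if b.get(j, 0) > 0}
            serviced, share = shapley_mech(cost, bidders)
            if serviced:
                feasible[j] = (share, serviced)
        if not feasible:
            return assignment, payments
        j_min = min(feasible, key=lambda j: (feasible[j][0], j))
        share, serviced = feasible[j_min]
        for i in serviced:
            assignment[i], payments[i] = j_min, share
            del remaining[i]
        del costs[j_min]

def subst_on(costs, user_bids, horizon):
    """user_bids maps user -> (s_i, e_i, per-slot value, set of opts)."""
    locked = {}            # user -> optimization she can no longer leave
    final_payments = {}
    for t in range(1, horizon + 1):
        bids = {}
        for i, (s, e, value, opts) in user_bids.items():
            if i in locked:
                bids[i] = {locked[i]: math.inf}     # forced to stay on her opt
            elif s <= t <= e:
                residual = value * (e - t + 1)      # value left from slot t on
                bids[i] = {j: residual for j in opts}
        assignment, payments = subst_off(costs, bids)
        locked.update(assignment)
        for i, (s, e, value, opts) in user_bids.items():
            if e == t and i in assignment:
                final_payments[i] = payments[i]     # pay when the bid expires
    return locked, final_payments

costs = {1: 60, 2: 100, 3: 50}
user_bids = {1: (1, 2, 100, {1, 2}), 2: (2, 3, 100, {1, 2, 3}), 3: (3, 3, 100, {3})}
locked, final_payments = subst_on(costs, user_bids, horizon=3)
```

Under these assumptions users 1 and 2 end up on optimization 1 paying 30 each, and user 3 pays 50 for optimization 3, as in the example.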

Due to space constraints, we defer the proof that Subst^On
is truthful and cost-recovering to our technical report [41].

Multiple Identities. Users with dummy identities can, in theory,
increase their utility at the expense of other users for
substitutable optimizations, though this is hard to do in practice.
We illustrate this for Subst^Off, but the conclusions also
apply to Subst^On. Consider users {1, 2, 3} with single-slot bids
({1}, 5), ({1, 2}, 2.51), and ({2}, 7) for optimizations {1, 2}
with costs $C_1 = 6$ and $C_2 = 5$. With no dummy users,
optimization 2 is implemented with a payment of 2.5 and
utilities of 0.01 for user 2 and 4.5 for user 3. If user 1
creates two identities 1' and 1'' that make a bid of 2.5 each for
optimization 1, then both optimizations are implemented,
with optimization 1 serving {1', 1'', 2}, and the utilities are 1,
0.51, and 2 for users 1, 2, and 3 respectively. Note that user
3's utility has been reduced. However, to cheat, user 1 needed to
know the number of other users and their bids, which is not
publicly known in practice. She may try guessing, but in
the worst case, her guess can lead to a reduction in her
utility [41]. Thus, being truthful is the optimal strategy when
the user does not know the other bids.

7. EVALUATION 

Our mechanisms guarantee truthfulness and cost-recovery,
but they do not optimize for total utility. In
this section, we empirically evaluate the total utility that
our solutions provide. We focus on the two online mechanisms
(i.e., the Add^On Mechanism and the Subst^On Mechanism)
and compare them to the state-of-the-art regret-based
approach (Section 7.1) [16, 22]. The experiments consist of
both the motivating use-case (Section 2) and simulated
scenarios (Sections 7.3 through 7.6).

7.1 Regret-based Amortization 

Kantere, Dash, et al. [16, 22] proposed a regret-based ap- 
proach (called Regret, henceforth) to select optimizations. 
They developed a detailed economy of the cloud and con- 
sidered detailed query plans for computing regret. In this 
paper, we abstract away and evaluate the performance of the 
core regret-based approach without the surrounding econ- 
omy or plan details. We briefly describe the algorithm. 

The regret for an optimization $j$ at time $t$, termed $R_j(t)$,
is defined as the total value that would have been realized,
over all users, until time $t$ (and excluding time $t$), had $j$ been
implemented at $t = 0$. Formally, $R_j(t) = \sum_{\tau < t} \sum_{i \in I} v_{ij}(\tau)$,
where $I$ is the set of all users and $v_{ij}$ is user $i$'s valuation
for optimization $j$. The policy we adopt is the greedy
approach [31] where the optimization is implemented at that
time-slot $t$ when $C_j \le R_j(t)$. For substitutable optimizations,
once an optimization $j$ is implemented for a user $i$,
she stops benefiting from the other optimizations $J \setminus \{j\}$
and does not contribute to their regret.

We now explain how Regret sets prices. For ease of
explanation we assume a single optimization $j$ that Regret
implements at time $t_r$. Users in subsequent time-slots
can get access to it only after paying a price $p_j$. Regret
chooses $p_j$ to be the minimum payment such that the total
payment from future users equals $C_j$. If no price $p_j$
can recover the cost, it picks a price that minimizes the
cloud's loss. Note that Regret uses the residual value in the
game assuming perfect knowledge of future users' values. If
$I_j(p, t_r) = |\{i \mid \sum_{\tau \ge t_r} v_{ij}(\tau) \ge p\}|$ is the number of future
users who would pay $p$ for optimization $j$, then the cloud
loss would be $L_j(p, t_r) = C_j - p\,I_j(p, t_r)$. The payment
$p_j$ minimizes this loss, i.e., $p_j = \arg\min_p \max(L_j(p, t_r), 0)$.
(In case of ties, we choose the smallest $p_j$, so that user utilities
are maximized.) Thus, our price point is the optimal choice
to minimize the cloud loss: it gives an upper bound on how
well Regret would work in practice. The total social utility
(a.k.a. total utility) for Regret is defined the same way as
for the mechanisms (Section 3): the total value realized by
the users for the slots they are serviced minus the implemented
optimizations' costs. The cloud balance is the costs
of the optimizations minus the total payments by the users.
A negative balance means that the cloud incurs a loss.
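
The trigger rule and the pricing rule of the Regret baseline can be sketched as follows. This is an illustration under assumptions, not the evaluated implementation: the function names `regret_trigger` and `regret_price` are hypothetical, per-slot values are pre-aggregated over users, and the search over prices is restricted to a finite candidate set (each user's residual value and each equal split $C_j/k$), which is an assumption made here because the number of paying users only changes at residual values. Exact rational arithmetic is used so that a price of $C_j/k$ recovers the cost exactly.

```python
from fractions import Fraction

def regret_trigger(cost, slot_values):
    """slot_values[t-1] is the total value over all users at slot t (1-based).

    Returns the first slot t with cost <= R(t), where R(t) sums the values of
    the slots strictly before t, or None if that never happens in the horizon.
    """
    regret = 0
    for t, value in enumerate(slot_values, start=1):
        if cost <= regret:
            return t
        regret += value
    return None

def regret_price(cost, residuals):
    """residuals: each future user's total value from slot t_r on (non-empty).

    Returns the smallest price minimizing max(cost - price * payers, 0).
    """
    cost = Fraction(cost)
    residuals = [Fraction(r) for r in residuals]
    # Candidate prices: residual values and equal splits of the cost.
    candidates = set(residuals) | {cost / k for k in range(1, len(residuals) + 1)}

    def capped_loss(price):
        payers = sum(1 for r in residuals if r >= price)   # I_j(p, t_r)
        return max(cost - price * payers, 0)               # max(L_j, 0)

    # Sort key: least loss first, then the smallest price among ties.
    return min(candidates, key=lambda p: (capped_loss(p), p))

t_r = regret_trigger(10, [4, 4, 4, 4])       # regret reaches 12 before slot 4
price_ok = regret_price(10, [6, 6, 3])       # two users at 5 recover the cost
price_loss = regret_price(20, [6, 6, 3])     # cost cannot be recovered
```

In the last call no price recovers the cost, so the sketch falls back to the price with the largest revenue, which corresponds to the smallest cloud loss.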

Our approach thus computes regret the same way as Kan- 
tere, Dash, et al. [22, 16] except that, in their approach, 
users assign values to individual queries. Our approach ag- 
gregates this information and assigns values to workloads 
spanning larger periods of time. 

7.2 Evaluation on the Motivating Use-Case 

The workload from the motivating use-case in Section 2
traces the evolution of halos over 27 snapshots of a universe
simulation. Each astronomer starts with a subset of halos, $\gamma$,
in the final snapshot (snapshot 27) and, for each halo $g \in \gamma$, she (a)
computes the halos in each previous snapshot contributing
the most particles to $g$, and (b) recursively computes a chain
of halos $(h_1^g, \ldots, h_{26}^g, g)$ such that $h_i^g$ contributes the most
mass to the halo $h_{i+1}^g$ in the next snapshot. Our optimizations
materialize the following relation for each snapshot:
(particleID, haloID) to speed up the queries.

We experiment with six users with differing workloads:
two workloads (in use by the astronomers) trace the evolution
of halos $\gamma_1$ and $\gamma_2$, respectively, using all 27 snapshots.
Based on the astronomers' feedback, we define two new users
for each of $\gamma_1$ and $\gamma_2$: one user uses every 2nd snapshot while
the other uses every 4th snapshot. This simulates faster,
exploratory studies of the data. In our experiments, we measure
the total utility (Sec. 3) for both Add^On and Regret.

We take each optimization's cost to be the dollar amount 
of storing the materialized view on a yearly subscription of 
the Amazon EC2 High-Memory Extra Large Instance [5]. 
This yields an average cost of \$2.31 per optimization.$^5$

We take the money saved, by completing queries earlier, 
to be the value of an optimization (Amazon also charges for 
each hour of use in addition to the subscription fee). For 
the six users, the run-time of their workload without any 
optimizations is 81, 36, 16, 83, 44 and 17 mins. Materializing 
the view on the snapshot 27 saves 44, 18, 8, 39, 23, and 9 min 
which corresponds to monetary savings of 18, 7, 3, 16, 9, and 
4 cents for one execution of the workloads. The other opti- 
mizations reduce run-time by 2.5 min each for a saving of 1 
cent. Since the optimizations affect different queries in the 
workload, we take them to be additive. 

We consider a year-long time-period where each user uses
the service in multiples of a quarter (3 months). We explore
all the $10^6$ ways that the group can bid for slots. For each
alternative, we then vary the total number of executions
of each user's workload, and we compute the total utility
achieved by each approach. Figure 1 shows the average and
the standard deviation of the utilities across the $10^6$
alternatives as we change usage intensity from low (1 workload
execution/quarter) to medium (1 workload execution/day).

Compared to the baseline cost, taken to be the total cost
of executing the workloads without optimizations, Add^On
and Regret yield total utilities of 28%-47% and 16%-40% of
the baseline cost, respectively. Since Add^On ensures that
users will pay the entire cost, the total utility is exactly the
amount of money saved by the group; while for Regret, the
total money users save is the sum of the total utility and the
unpaid fraction of the cost, i.e., the cloud balance. We add
this balance to the utility since the total utility includes
the utility of both the users and the cloud.$^6$ Thus, both
approaches significantly reduce the cost of using the cloud.

Comparing Add^On to Regret, we find that Add^On yields a
total utility that is 18%-118% higher than Regret, at 90 and
40 executions per user, respectively. Further, while the cloud
never makes a loss with Add^On, the loss by Regret can be up to
a substantial 92% of Regret's utility (at 40 executions). As
noted before, our outcomes for Regret are an upper bound,
and with more realistic bids Regret is likely to do even worse.



$^5$We could have used a different instance. We chose this one
as it was the most similar to our local machine, on which we
obtained the storage space and query run-time values.
$^6$In the case of a scientific collaboration, we can also assume
that one of the researchers pays a public cloud to implement
the optimization. She then asks the other researchers to pay
her back. That researcher is then the one who incurs the
loss. In this case, the total social utility would be the amount
saved by the entire group of researchers.

Figure 1: Operating expenses without optimization and
total utility (equal to total money saved) by Add^On and
Regret for the astronomy workload on an Amazon EC2
subscription, as workloads are executed more frequently.

In practice, users would execute their workloads multiple 
times and datasets are likely to be larger. For example, the 
upcoming NCSA/IBM Blue Waters system [28] can generate 
10 TB to 200 TB per snapshot (as opposed to 4.8 GB per 
snapshot for our experiments). With a 3 to 5 orders of 
magnitude increase in data size, building optimizations and 
executing workloads would be correspondingly costlier, and 
sharing optimizations would lead to proportionately larger 
savings in the order of tens of thousands of dollars. 

7.3 Collaboration Size 

In the remaining sections, we use a variety of simulated 
configurations to explore how our mechanisms and the Re- 
gret approach compare in different settings. In all cases, we 
measure the total utility. 

The first key parameter affecting utility is the cost of op- 
timizations as a proportion of the user values. This ratio 
affects the number of users that are necessary to cover the 
optimizations' cost. In all simulations, we change this pro- 
portion by varying the per-optimization cost along the x-axis 
while keeping the average user values constant. In this sec- 
tion, we measure the utility of both approaches when the 
total number of users available to cover the optimizations' 
cost is either small (small collaborations) or large (large col- 
laborations). For both approaches, users in larger collabo- 
rations can buy costlier optimizations to get higher utilities. 
We experiment with a small group of 6 users and a large 
one with 24 users. We let users pick one service slot,
uniformly at random, from 12 slots.$^7$ This gives us an expected
number of users/slot of 0.5 and 2, respectively.

7.3.1 Additive Optimizations 

We first consider additive optimizations. We only consider 
one optimization since optimizations are independent. 

For small collaborations, Figure 2(a) shows that as we
move from cheap to costly optimizations, Regret provides
good total utility, but then quickly leads to cloud loss,
followed by negative total utility, while Add^On never leads to
cloud loss or negative utilities. Negative utilities by Regret
imply that the optimization was implemented but failed to
provide enough value to justify its implementation. Restricting
our attention to the costs where Regret yields a positive
utility, Add^On achieves an average total utility 1.43x higher
than Regret. Further, while Regret leads to cloud loss (curve
"Regret Balance" in the figure) at a cost of 0.18, even for
optimizations 7x costlier, Add^On yields substantial utility


$^7$The number 12 was chosen since 2, 3, 4, and 6 divide it
evenly and give us a larger space of parameter values to
experiment with as compared to some other number like
10 or 15. The other parameter values were chosen to be
multiples of 12 for ease of understanding.

Figure 2: Total utility as a function of optimization cost for different collaboration sizes. Also showing regret balance
(optimization costs minus user payments). Add^On and Subst^On outperform Regret for a large range of optimization
costs, for both additive and substitutive optimizations, and for both low and high degrees of collaboration amongst
users. Further, they never incur a loss, while Regret can incur significant loss. Detailed analysis in Section 7.3.

(taken to be 0.3, 10% of total user value). Regret
underperforms against Add^On for two reasons. First, for cheap
optimizations that should be implemented, Regret loses user
value while building up regret. Second, for costly optimizations,
Regret suffers a loss and negative total utility since it
implements the optimization even when the available future
value is insufficient to recoup the cost.

For larger collaborations, Figure 2(b) shows that as we
move to costlier optimizations, Add^On provides worse utility
than Regret. Intuitively, Add^On loses some opportunities
to implement optimizations because it is more cautious
than Regret: to avoid losses, Add^On only implements an
optimization when it is certain to recoup the costs given
current information. The benefit of Regret, however, is limited:
Regret soon starts losing money and leads to negative
total utility. In fact, only in less than 10% of the range
where Regret achieves a positive utility ([0, 4.92]) does it
also outperform Add^On and yield no loss. Over the entire
range of costs in [0, 3.0] the average total utility of Add^On
is 0.87 while that of Regret is -0.63.

For large collaborations, Add^On utilities sharply decrease
after a point because when costs increase, the payment per
user increases super-linearly, since Add^On prunes out users
for whom the payments are larger than the value. No users
are pruned by Regret and thus it sees a linear reduction in
utilities with increasing costs.

Interestingly, the range of costs for which Regret makes a 
loss depends on the number of users who bid. It yields a loss 
at a cost of 0.18 for the small group (Figure 2(a)) and 1.80 
for the large one (Figure 2(b)). Thus, without knowing the 
future users, the cloud cannot know when to avoid Regret.

7.3.2 Substitutive Optimizations 

To compare Subst^On and Regret in the case of substitutive
optimizations, we consider a scenario with 12 optimizations. 
Each user selects 3 optimizations, uniformly at random, as 
the set of substitutes (Section 7.6 experiments with other 
ratios). Unlike the additive case, the costs of the 12 opti- 
mizations are sampled uniformly from [0, 2c] so that c is the 
average optimization cost: this is to simulate that not all 
substitutes are equally expensive. Thus the x-axes of Figures
2(c) and 2(d) show the mean cost of the optimizations.

Compared to the corresponding additive optimizations in 
Figures 2(a) and 2(b), both Subst^On and Regret achieve
lower overall utility. Indeed, with substitutes, each opti- 
mization has fewer users bidding for it and, once an opti- 
mization is implemented, the serviced users no longer pay for 
the other optimizations. Hence, fewer optimizations are im- 
plemented and, in the case of Regret, there are fewer users 
over whom the costs can be amortized. In the scenarios 



(a) More collaboration on the left. X-axis is the total number
of slots. The users bid for 1 slot.
(b) Less collaboration on the left. X-axis is the # of contiguous
slots that each user bids for.

Figure 3: Add^On vs Regret performance with varying
degree of collaboration. (Section 7.4)

shown, Regret yields a loss earlier than in the additive case.
When averaged over those costs for which Regret yields positive
utility, Subst^On yields 1.63x and 3x more utility than
Regret for group sizes of 24 and 6, respectively.

7.4 Overlap in Usage 

The second key parameter that affects utility is how the
user values are distributed across time. We study this parameter
using a small group of 6 users collaborating on a
single, additive optimization. We vary both the degree and
the manner of user overlap. First, we repeat the experiment
from Figure 2(a) while decreasing the total number of slots
from 12 to 1. Figure 3(a) shows that, with fewer slots
to sample from and hence with increased overlap amongst
users, Add generates 0.77 to 2.75 more utility, on average,
than Regret. Thus, Add gets 25%-91% of the total user
value (3.0) as additional utility over Regret. Decreasing the
number of slots increases the probability that Add finds
enough value in some slot to justify implementing the optimization.
In contrast, regret accumulation stays unchanged.
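The effect of overlap can be illustrated with a minimal sketch of the textbook offline Shapley Value Mechanism applied to the bids that fall in a single slot. This is not the full online Add mechanism evaluated above; the function name `shapley_value_mechanism` and the example numbers are assumptions chosen for illustration.

```python
def shapley_value_mechanism(bids, cost):
    """Equal-share cost sharing for one optimization (illustrative sketch).

    bids: dict mapping user -> declared value for the optimization.
    cost: cost of implementing the optimization.
    Returns (serviced_users, price_per_user). Users whose bid is below the
    current equal share are dropped until the serviced set is stable.
    """
    serviced = set(bids)
    while serviced:
        share = cost / len(serviced)
        keep = {u for u in serviced if bids[u] >= share}
        if keep == serviced:
            # Every remaining user can afford the equal share.
            return serviced, share
        serviced = keep
    # No group of users can cover the cost: do not implement.
    return set(), 0.0
```

For instance, with a hypothetical cost of 1.5 and six users who each value the optimization at 0.5, all six are serviced at a price of 0.25 when their bids fall in the same slot. Two such users alone in a slot would each face a share of 0.75, so no one is serviced and the optimization is not implemented.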

Next, we study what happens when user values are spread
across an interval rather than being concentrated in a single
time-slot. The setup in Figure 3(b) is identical to the
additive case with the group size of 6 in Figure 2(a), except
that instead of bidding for only one slot, users bid as
(s_i, s_i + d - 1), where d is the duration of the service and
is varied on the x-axis, and s_i is chosen uniformly at random
from 12 slots. Users divide their values, chosen uniformly at
random from [0, 1), equally among all d time slots in their
bids. The average extra value that Add generates over
Regret increases from 0.77 to 0.98. Indeed, as users spread
their value across multiple time-slots, Add becomes more
likely to find a single time-slot with sufficient value to justify
implementing the optimization.
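The bid construction described above can be sketched as follows. The helper names `spread_bid` and `slot_totals` are hypothetical, and the sketch does not clip intervals that run past the last slot, since the text does not say how that case is handled.

```python
from collections import defaultdict


def spread_bid(start, duration, value):
    """Split `value` equally over slots start .. start + duration - 1."""
    per_slot = value / duration
    return {slot: per_slot for slot in range(start, start + duration)}


def slot_totals(bids):
    """Sum the per-slot values over all users' bids."""
    totals = defaultdict(float)
    for bid in bids:
        for slot, slot_value in bid.items():
            totals[slot] += slot_value
    return dict(totals)
```

For example, `spread_bid(3, 4, 1.0)` places a value of 0.25 in each of slots 3 to 6, and `slot_totals` shows how much declared value accumulates in each slot once several users' intervals overlap.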

7.5 Arrival Skew 

We now consider the small collaboration of 6 users bidding
for a single optimization, where they arrive: (a) uniformly at
random in one of 12 slots, (b) early, following an exponential
distribution with mean 1.2*, or (c) late, following a distribution
of 12 - t with t sampled exponentially with mean
1.2. Case (b) simulates datasets that become stale, while (c)
simulates datasets that become popular over time. We report
the ratio of the utility in each setting to the
utility of Add with early arrivals. Figure 4 shows that the total
utility of Add improves while that of Regret worsens
with irregular arrivals. Add outperforms Regret substantially
as user arrival becomes non-uniform (and Regret soon
starts generating negative utilities). With skew, Add improves
due to the increased chance of finding a slot with enough
value to pay for all costs. For example, with Add, early arrivals
can be 6.7x and 1.8x more efficient than uniform and late
arrivals, respectively. On the other hand, Regret worsens since skew
increases the chance that more regret is accumulated than
required†. For example, with Regret, at the cost of 0.54, late
and uniform arrivals have 16% and 40% higher total utility
than early arrivals, respectively. This points to an interesting
property of the mechanism-design-based approach: it
performs much better as non-uniformity increases.

Figure 4: Add improves while Regret worsens with
temporal skew. Ratios taken with the utility of Add
with users clustered early. (Section 7.5)
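The three arrival patterns can be sketched with Python's standard `random` module. The discretization is an assumption: the text does not say how a continuous exponential sample is mapped to a slot, so this sketch rounds up and clamps to slots 1 to 12. The names `to_slot` and `sample_arrival` are hypothetical.

```python
import math
import random

NUM_SLOTS = 12
MEAN = 1.2  # mean of the exponential distribution in cases (b) and (c)


def to_slot(time):
    """Map a continuous time to a slot in 1..NUM_SLOTS (assumed rounding)."""
    return min(NUM_SLOTS, max(1, math.ceil(time)))


def sample_arrival(case, rng):
    """Sample one user's arrival slot for case 'uniform', 'early', or 'late'."""
    if case == "uniform":
        return rng.randint(1, NUM_SLOTS)
    # expovariate takes the rate, i.e. 1 / mean.
    t = rng.expovariate(1.0 / MEAN)
    if case == "early":
        return to_slot(t)
    if case == "late":
        return to_slot(NUM_SLOTS - t)
    raise ValueError(f"unknown arrival case: {case}")
```

Calling `sample_arrival("early", random.Random(0))` six times yields the arrival slots for one simulated group of users. Under this sketch, early arrivals cluster in the first few slots and late arrivals in the last few.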

7.6 Selectivity of Substitutes 

We now vary the selectivity of the substitutes, which is defined
as the ratio of the number of substitutable optimizations
to the total number of optimizations. Figures 5(a)
and 5(b) show the total utility for selectivities of 0.75 and
0.25, where each user chooses 3 optimizations uniformly at
random from 4 and 12 optimizations, respectively. The figures
show that, with more selective users, the absolute utilities
derived by both algorithms decrease. For example, at an
optimization cost of 0.36, Regret goes from a utility of 1.10
to -0.23 while Subst goes from 2.38 to 1.90 as users become
more selective. Indeed, with more selective users, the number of users per
optimization decreases and more optimizations have to be
implemented to satisfy the users. For Figures 5(a) and
5(b), Subst yields an average total utility of 1.0 for optimizations
that are 2.5x and 12.5x costlier than those at
which Regret generates utilities of 1.0, respectively.
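The selectivity values and the drop in bidders per optimization can be checked with a short sketch. The function names are hypothetical, and the expected number of bidders per optimization follows from each user picking `num_chosen` of `num_total` optimizations uniformly at random.

```python
import random


def selectivity(num_chosen, num_total):
    """Ratio of substitutable optimizations to all optimizations."""
    return num_chosen / num_total


def choose_substitutes(num_chosen, num_total, rng):
    """Pick the distinct optimizations one user bids for, uniformly at random."""
    return sorted(rng.sample(range(num_total), num_chosen))


def expected_bidders_per_optimization(num_users, num_chosen, num_total):
    """Each optimization is picked by a given user with probability k / n."""
    return num_users * num_chosen / num_total
```

Here `selectivity(3, 4)` and `selectivity(3, 12)` give 0.75 and 0.25. For a hypothetical group of 6 users, the expected number of bidders per optimization falls from 4.5 to 1.5, which is consistent with fewer users sharing the cost of each optimization.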

Summary. Our mechanism-based approaches
not only guarantee truthfulness and cost-recovery
but also yield utility that frequently exceeds that of Regret.
Our approaches work especially well in scenarios where
many users derive significant value from an optimization 
during the same time-slot. They under-perform compared 
to Regret in scenarios where users value the same optimiza- 
tion but during non-overlapping periods. 



* With mean 1.2, the maximum starting time slot of 6 users
in 1000 runs was 12, as it is in case (a).

† Regret is computed after every time slot; hence it increases
in discrete steps. The difference between the accumulated regret
and the optimization cost is wasted value and is smaller for uniform arrivals.
Figure 5: [...] substitutable optimization on total utility. (Section 7.6)

8. RELATED WORK 

Today, cloud providers use two strategies for pricing optimizations.
In the first, the cost of the optimization is included
in the base service price. For example, Amazon SimpleDB [9]
automatically indexes user data and includes the
corresponding overhead in the base-price computation (45
bytes of extra storage are added to each item, attribute, and
attribute-value). Similarly, SimpleDB and SQL Azure [26]
automatically replicate data and include that cost in the
base service cost. The key limitation of this approach is
that the cloud must decide up-front which optimizations are
worth offering, and it forces users to pay for these optimizations.
In the second, users choose desired optimizations and
pay their exact cost. For example, in Amazon RDS [6] a user
can choose to launch and pay for a desired number of read
replicas to speed up her query workload. This approach,
however, works well only in the absence of collaborations.

Significant recent work studies existing cloud pricing 
schemes, economic models, and their implications [24, 39, 
44]. In contrast, we develop a new pricing mechanism.

Most closely related to our work, Dash et al. developed an
approach for pricing data structures (indexes, materialized
views, etc.) in a DBMS cloud cache [16]. In their approach,
the cloud selects the structures to build based on the notion
of regret, and the cost of each structure is amortized over the
first N queries that use it. To compute regret, the cloud relies on
user-supplied budget functions that indicate users' willingness to pay
for various qualities of service. In follow-up work, Kantere et
al. [22] tuned this approach and developed a regression-based
technique to predict the extent of cost amortization.
In contrast to our work, this previous approach relies on
users being truthful and does not guarantee that the cost
will be recovered. For example, consider a user who needs
to run one very expensive query over a private dataset. No
structure will be implemented if she is truthful. Instead, she
submits a large number of inexpensive queries over the
same dataset, expressing a willingness to pay zero
for processing these extra queries, yet indicating a preference
for low execution times over low costs. The regret-based
approach will let her manually pick slow and cheap service
for these queries. It will then compute the maximum possible
regret for the missing data structure that would have
enabled faster plans for these queries. When the cloud has
accumulated enough regret, she can run the expensive query and
pay a small fraction of the total cost of the optimization.

Significant research applies economic principles to re- 
source allocation in distributed systems [1, 12, 13, 14, 18, 
34, 36, 43], collaboration promotion in peer-to-peer sys- 
tems [30, 29, 42], or more recently, VM allocation in the 
cloud [40]. We study how to choose and price optimizations 
rather than allocate processing resources. The Mariposa
distributed database system [38] introduced a micro-economic
paradigm for optimizing distributed query evaluation and 
data placement. This is a problem orthogonal to ours. 

We build on the Shapley Value Mechanism, which is an 
instance of Moulin Mechanisms [27] that have been designed 
for various offline combinatorial cost-sharing problems [32].
We design Moulin mechanisms in an online setting. 

Online mechanisms [31, Ch. 16] consider games where 
valuations come one at a time. While there is work on char- 
acterizing truthful mechanisms to maximize social utility in 
dynamic games [31, Thm. 16.17], to the best of our knowl- 
edge, no work applies to cost-sharing in dynamic games. 

9. CONCLUSIONS 

We studied how a cloud data service provider should ac- 
tivate and price optimizations that benefit many users. We 
have shown how the problem can be modeled as an instance 
of cost-recovery mechanism design. We also showed how the 
Shapley Value mechanism solves the problem of pricing a 
single optimization in an offline setting. We then developed 
a series of mechanisms that enable the pricing of either ad- 
ditive or substitutive optimizations in either an offline or an 
online game. We proved analytically that our mechanisms 
are truthful and cost-recovering. Through simulations, we 
demonstrated that our mechanisms also yield high utility 
compared with a regret-based state-of-the-art approach. 